Automatic Classification of Electronic Music and Speech / Music Audio Content

Author

  • AUSTIN C. CHEN
Abstract

Automatic audio categorization has great potential for application in the maintenance and usage of large and constantly growing media databases; accordingly, much research has been done to demonstrate the feasibility of such methods. A popular topic is that of automatic genre classification, accomplished by training machine learning algorithms. However, "electronic" or "techno" music is often misrepresented in prior work, especially given the recent rapid evolution of the genre and subsequent splintering into distinctive subgenres. As such, features are extracted from electronic music samples in an experiment to categorize song samples into three subgenres: deep house, dubstep, and progressive house. An overall classification performance of 80.67% accuracy is achieved, comparable to prior work. Similarly, many past studies have been conducted on speech/music discrimination due to the potential applications for broadcast and other media, but it remains possible to expand the experimental scope to include samples of speech with varying amounts of background music. The development and evaluation of two measures of the ratio between speech energy and music energy are explored: a reference measure called speech-to-music ratio (SMR) and a feature which is an imprecise estimate of SMR called estimated voice-to-music ratio (eVMR). eVMR is an objective signal measure computed by taking advantage of broadcast mixing techniques in which vocals, unlike most instruments, are typically placed at stereo center. Conversely, SMR is a hidden variable defined by the relationship between the powers of portions of audio attributed to speech and music. It is shown that eVMR is predictive of SMR and can be combined with state-of-the-art features in order to improve performance. For evaluation, this new metric is applied in speech/music (binary) classification, speech/music/mixed (trinary) classification, and a new speech-to-music ratio estimation problem. Promising results are achieved, including 93.06% accuracy for trinary classification and 3.86 dB RMSE estimation of the SMR.
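
The relationship between the two measures can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption: SMR is taken as a power ratio in dB between separately available speech and music signals, and the eVMR-style feature is approximated by a simple mid/side energy ratio of a stereo mix. The function names (`smr_db`, `center_side_ratio_db`, `demo`) and the synthetic test signals are hypothetical and are not taken from the thesis.

```python
import math


def power(samples):
    """Mean power (mean of squared samples) of a mono signal."""
    return sum(x * x for x in samples) / len(samples)


def smr_db(speech, music):
    """Assumed reference SMR: ratio of speech power to music power, in dB."""
    return 10.0 * math.log10(power(speech) / power(music))


def center_side_ratio_db(left, right, eps=1e-12):
    """Assumed stand-in for an eVMR-style feature (not the thesis's formula).

    Compares the energy of the mid channel (L+R)/2, where center-panned
    vocals concentrate, with the energy of the side channel (L-R)/2.
    """
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return 10.0 * math.log10((power(mid) + eps) / (power(side) + eps))


def demo(sample_rate=8000, n=8000):
    """Build an idealized mix: 'speech' at stereo center, 'music' fully wide."""
    speech = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(n)]
    music = [0.5 * math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(n)]
    left = [s + m for s, m in zip(speech, music)]   # music in phase on the left
    right = [s - m for s, m in zip(speech, music)]  # music inverted on the right
    return smr_db(speech, music), center_side_ratio_db(left, right)


if __name__ == "__main__":
    reference, estimate = demo()
    print(f"reference SMR: {reference:.2f} dB, mid/side estimate: {estimate:.2f} dB")
```

In this idealized mix the mid channel contains only the speech and the side channel only the music, so the two values coincide. Real broadcast material also places musical energy at stereo center, which is consistent with the abstract describing eVMR as an imprecise estimate of SMR.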

Similar resources

Automatic Identification and Classification of the Iranian Traditional Music Scales (Dastgāh) and Melody Models (Gusheh): Analytical and Comparative Review on Conducted Research

Background and Aim: Automatic identification and classification of the Iranian traditional music scales (Dastgāh) and melody models (Gusheh) has attracted the attention of researchers for more than a decade. The current research aims to review the research conducted in this area and to consider its different approaches and obstacles. Method: The research approach is content analysis and data col...

Automatic Music Genre Identification

Automatic analysis of music signals has gained considerable importance due to the growing amount of music data found on the Web. Music genre classification is one of the interesting research areas in music information retrieval systems. In this paper, several techniques were implemented and evaluated for music genre classification, including feature extraction, feature selection and m...

Musical genre classification of audio signals

Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics typically are related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the We...

Precision

As one of the key methods for extracting content semantics and structure from audio, automatic audio classification, especially of speech and music, is valuable for content-based audio retrieval, video summarization and retrieval, and spoken document retrieval. Because a hidden Markov model (HMM) can model the temporal statistical properties of an audio signal well, a left-right discrete HMM is proposed to c...

Audio content analysis for online audiovisual data segmentation and classification

While current approaches for audiovisual data segmentation and classification are mostly focused on visual cues, audio signals may actually play a more important role in content parsing for many applications. An approach to automatic segmentation and classification of audiovisual data based on audio content analysis is proposed. The audio signal from movies or TV programs is segmented and class...

Publication date: 2014